On the Stratification of Multi-label Data

Authors

  • Konstantinos Sechidis
  • Grigorios Tsoumakas
  • Ioannis P. Vlahavas
Abstract

Stratified sampling is a sampling method that takes into account the existence of disjoint groups within a population and produces samples in which the proportions of these groups are maintained. In single-label classification tasks, groups are differentiated based on the value of the target variable. In multi-label learning tasks, however, where there are multiple target variables, it is not clear how stratified sampling could or should be performed. This paper investigates stratification in the multi-label data context. It considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets and according to a number of evaluation criteria. The results show how useful each method is for particular types of multi-label datasets.
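
To make the single-label case in the abstract concrete, the following is a minimal sketch of a stratified train/test split in plain Python. It is not one of the two multi-label methods the paper studies. The function name `stratified_split`, its parameters, and the rounding rule are assumptions chosen for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split example indices so each class keeps roughly the same share in both parts.

    Illustrative single-label sketch only; the name, parameters, and rounding
    rule are assumptions, not the methods studied in the paper.
    """
    rng = random.Random(seed)

    # Group example indices by their (single) class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut the same fraction from every class.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy data: 80 examples of class "a" and 20 of class "b".
labels = ["a"] * 80 + ["b"] * 20
train, test = stratified_split(labels, test_fraction=0.25)
```

With this toy data, both parts keep the original 80/20 class ratio. In the multi-label setting each example carries a set of labels rather than a single value, so there is no single grouping key to split on, which is the difficulty the abstract raises.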

Similar resources

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

A Network Perspective on Stratification of Multi-Label Data

In the recent years, we have witnessed the development of multi-label classification methods which utilize the structure of the label space in a divide and conquer approach to improve classification performance and allow large data sets to be classified efficiently. Yet most of the available data sets have been provided in train/test splits that did not account for maintaining a distribution of...

A Survey of Social Factors Influencing Social Consensus(Case Study: Bushehr Civic Families)

The aim of this research is to study social factors influencing on social consensus. Sampling method was multi-process and included cluster and multistage sampling and sample size based on Cochran's Formula was 380 persons too. Data collection tools was questionnaire. In this research, the methods of data analysis were independent T-Test, Spearman Correlation Coefficient, Multivariate Regressio...

Application of pH Indicator Label Based on Beetroot Color for Determination of Milk Freshness

Introduction: Applying of a new indicator in food packaging can be effective to inform consumers about the freshness and quality of the products. Materials and Methods: In the current study, a new milk freshness label was investigated containing beetroot color and multi layers of polystyrene. The label characteristics were investigated by estimating color number, release test, and scanning ele...

Publication date: 2011